Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval
Authors
Abstract
Latent Semantic Indexing (LSI), a variant of the classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture latent semantic relationships between data items. Formal Concept Analysis (FCA) uses mathematical lattices to represent conceptual hierarchies in data and to retrieve information. Both LSI and FCA, however, operate on data represented as matrices. The objective of this paper is to systematically analyze VSM, LSI and FCA for the task of IR using standard and real-life datasets.
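The difference between VSM and LSI on a term-document matrix can be illustrated with a minimal sketch. It is not the paper's experimental setup: the toy matrix, the term list, the rank `k=2`, and the helper names `cosine`, `vsm_scores` and `lsi_scores` are all assumptions made for illustration. It assumes NumPy is available and uses the common query fold-in $q_k = \Sigma_k^{-1} U_k^{T} q$.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# d0 = {car, engine}, d1 = {automobile, engine},
# d2 = {car, automobile}, d3 = {flower, petal}
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],   # car
    [0, 1, 1, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 0, 1],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def vsm_scores(A, q):
    """Classical VSM: compare the query with each raw document column."""
    return [cosine(A[:, j], q) for j in range(A.shape[1])]


def lsi_scores(A, q, k):
    """LSI: compare query and documents in a rank-k latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    q_k = (q @ Uk) / sk      # fold the query in: Sigma_k^{-1} U_k^T q
    docs_k = Vtk.T           # row j holds document j in the latent space
    return [cosine(docs_k[j], q_k) for j in range(A.shape[1])]


# Query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)
vsm = vsm_scores(A, q)
lsi = lsi_scores(A, q, k=2)
print("VSM:", np.round(vsm, 3))
print("LSI:", np.round(lsi, 3))
```

In this toy collection, document d1 does not contain "car", so VSM scores it 0, whereas the rank-2 LSI space places it next to the other vehicle documents because the terms co-occur. Real collections would normally use weighted entries (for example tf-idf) and a much larger `k`.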
Similar resources
Information Retrieval as Semantics Transformation
In this paper we present Information Retrieval as a semantics transformation problem. We describe a general theory for deriving concepts, and discuss two special cases: the vector model and the set model. The vector model leads to concepts derived by latent semantic indexing using the singular value decomposition. The set model leads to Formal Concept Analysis. We discuss the relation between the...
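As a rough illustration of the set model, the sketch below derives formal concepts from a small binary document-term context and uses the extent of a query's term set for retrieval. The context, the document names, and the functions `extent`, `intent`, `concepts` and `retrieve` are hypothetical and chosen only for this example; the brute-force enumeration is exponential in the number of terms and is suitable for toy data only.

```python
from itertools import combinations

# Hypothetical formal context: each document (object) maps to its terms (attributes).
context = {
    "d1": {"retrieval", "vector"},
    "d2": {"retrieval", "lattice"},
    "d3": {"retrieval", "vector", "svd"},
}
all_attrs = set().union(*context.values())


def extent(attrs):
    """Documents that contain every term in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs):
    """Terms shared by every document in objs (all terms if objs is empty)."""
    if not objs:
        return frozenset(all_attrs)
    return frozenset(set.intersection(*(context[o] for o in objs)))


def concepts():
    """Enumerate all formal concepts (extent, intent) by brute force."""
    found = set()
    attrs = sorted(all_attrs)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            e = extent(set(subset))
            found.add((e, intent(e)))   # closing the extent yields a concept
    return found


def retrieve(query_terms):
    """Answer a conjunctive query with the extent of its term set."""
    return sorted(extent(set(query_terms)))


for e, i in sorted(concepts(), key=lambda c: -len(c[0])):
    print(sorted(e), sorted(i))
print(retrieve(["vector"]))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice; a retrieval system could then move up or down from the query's concept to broaden or narrow the result set.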
Comparison of Information Retrieval Techniques: Latent Semantic Indexing and Concept Indexing
The task of information retrieval is to extract relevant documents for a certain query from the collection of documents. As large sets of documents are now increasingly common, there is a growing need for fast and efficient information retrieval algorithms. The algorithms we are dealing with are embedded in the vector space model. In this paper we compare two information retrieval techniques: l...
A measure theoretic approach to information retrieval
The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterised by a discrepancy between its formal framework and implementable form. The underlying concepts of the vector space model are mathematical terms: linear space, vector, and inner product. However, in the vector space model, the mathematical meaning o...
Latent Semantic Indexing Based on Factor Analysis
The main purpose of this paper is to propose a novel latent semantic indexing (LSI), statistical approach to simultaneously mapping documents and terms into a latent semantic space. This approach can index documents more effectively than the vector space model (VSM). Latent semantic indexing (LSI), which is based on singular value decomposition (SVD), and probabilistic latent semantic indexing ...
Document Clustering: Before and After the Singular Value Decomposition
Document Clustering is an issue of measuring similarity between documents and grouping similar documents together. Information Retrieval (IR) is an issue of comparing query with a collection of documents to locate a set of documents relevant to a particular query. In the vector space IR model, a query is treated as a document which consists of a few terms. Therefore, in both clustering and retr...